Running a Test & Viewing the Report
In this section, you will learn how to test an AI model against your dataset version and view the results.
Testing the AI Model
To run a test and view the report, do the following:
- Click the Run Test button to start testing and validating the AI model.
The AI model is submitted successfully, and after a few minutes the test starts running on it.
After the test completes successfully, you can view the F1, Precision, and Recall scores of the test.
For a detailed report, click the View Report button.
Viewing the Report
In this section, you will learn how to view the AI model performance report.
To view the report, do the following:
- Click the View Report button.
The Test Results dialog box is displayed.
Understanding the Test Report
In this section, you will learn how to read and interpret the AI model performance report.
Summary of Key Metrics
When you run a model test, the platform calculates the following core evaluation metrics. These metrics indicate how well the AI model detects your target objects (`s300` in this case) compared to the ground-truth dataset.
Metric | Value | Definition | Why It Matters |
---|---|---|---|
F1 Score | 95.0% | Harmonic mean of Precision and Recall: 2 × (Precision × Recall) / (Precision + Recall) | Since Precision = Recall, F1 is equal to both. Indicates balanced performance with no trade-off. |
Precision | 95.0% | Correctly predicted positive cases out of all predicted positives: TP / (TP + FP) | 95% of predictions labeled as positive (`s300`) were actually correct → very few false alarms. |
Recall | 95.0% | Correctly predicted positive cases out of all actual positives: TP / (TP + FN) | 95% of actual `s300` instances were detected → only one or two were missed. |
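As a quick sanity check of these formulas, here is a minimal Python sketch of the same calculations. The TP/FP/FN counts are illustrative assumptions, not the counts behind this particular report.

```python
def precision(tp, fp):
    """Correct positive predictions out of all predicted positives: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Correct positive predictions out of all actual positives: TP / (TP + FN)."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of Precision and Recall: 2 * P * R / (P + R)."""
    return 2 * p * r / (p + r)

# Illustrative counts only: 19 hits, 1 false alarm, 1 miss.
p = precision(tp=19, fp=1)   # 0.95
r = recall(tp=19, fn=1)      # 0.95
print(f"Precision={p:.1%}  Recall={r:.1%}  F1={f1_score(p, r):.1%}")
```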
Confusion Matrix – Plot
Summarizes classification performance using a confusion matrix across multiple categories.
| | Predicted: s300 | Predicted: background | Predicted: *background |
|---|---|---|---|
| Actual: s300 | 17 (True Positive - TP) | 0 (False Negative - FN) | 1 (FN / misclassification) |
| Actual: background | 0 | 0 | 0 |
| Actual: *background | 1 (False Positive - FP) | 0 | 0 |
Interpretation
- 17 TP: The model correctly identified 17 instances of `s300`.
- 1 FN: One `s300` was missed and classified as `*background`.
- 1 FP: A background instance was incorrectly labeled as `s300`.
- Balanced performance: Precision, Recall, and F1 Score are all 95%, showing that the model performs consistently across detection and error handling.
💡 Use Case: Identify both misclassifications and missed detections while confirming balanced model performance.
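If you want to reproduce this matrix layout outside the platform, a minimal scikit-learn sketch is shown below. The label lists are hypothetical stand-ins for exported annotations and predictions; the platform generates the report internally.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical per-object labels; 19 objects total.
actual    = ["s300"] * 18 + ["*background"]              # 18 real s300 objects, 1 background object
predicted = ["s300"] * 17 + ["*background"] + ["s300"]   # 17 hits, 1 missed s300, 1 false alarm

labels = ["s300", "background", "*background"]
matrix = confusion_matrix(actual, predicted, labels=labels)

# Rows are actual classes, columns are predicted classes, in `labels` order.
for row_label, row in zip(labels, matrix):
    print(f"Actual {row_label:>12}: {row}")
```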
Confusion Matrix – Visualize
Provides a geospatial visualization of actual vs predicted detections on imagery.
Element | Description |
---|---|
Dataset List | Choose files from the dataset (e.g., ZIP archives) to view on the map. |
Satellite Map | Overlays actual annotations (red) and model predictions (blue). |
Category Toggle | Enable or disable categories such as `s300` and `*background`. |
Confidence Slider | Adjust threshold to filter predictions based on model confidence scores. |
💡 Use Case: Helps users visually inspect and confirm whether detections align correctly with ground truth objects on the imagery.
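Conceptually, the Confidence Slider is a simple score filter over the model's detections. The sketch below illustrates the idea with a hypothetical prediction list; the `category`, `score`, and `bbox` fields are assumptions, not the platform's actual export format.

```python
# Hypothetical detections as they might be exported from a test run.
predictions = [
    {"category": "s300", "score": 0.97, "bbox": [1204, 388, 1260, 441]},
    {"category": "s300", "score": 0.62, "bbox": [2010, 755, 2063, 802]},
    {"category": "*background", "score": 0.31, "bbox": [455, 90, 510, 140]},
]

def filter_by_confidence(preds, threshold):
    """Keep only detections whose confidence score meets the slider threshold."""
    return [p for p in preds if p["score"] >= threshold]

# Raising the threshold hides low-confidence predictions on the map,
# which usually removes false alarms but can also hide true detections.
print(len(filter_by_confidence(predictions, 0.5)))  # 2 detections remain
print(len(filter_by_confidence(predictions, 0.9)))  # 1 detection remains
```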
Dataset & Category Score
Displays evaluation metrics per dataset and per category, enabling fine-grained analysis.
Score Table
Dataset | F1 Score | Precision | Recall |
---|---|---|---|
Version 1 | 95.0% | 95.0% | 95.0% |
└── s300 | 95.0% | 95.0% | 95.0% |
└── background | 95.0% | 95.0% | 95.0% |
└── *background | 95.0% | 95.0% | 95.0% |
Interpretation
- All three metrics (F1, Precision, Recall) are identical (95%), indicating balanced model behavior.
- The performance is consistent across classes, suggesting the model handles positives (`s300`) and negatives (background classes) without bias.
💡 Use Case: Use this tab to confirm balanced accuracy across multiple categories, ensuring the model is equally reliable in detecting and rejecting classes.
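If you export the per-object results, a per-category breakdown similar to this table can be reproduced with scikit-learn, as in the sketch below. The label lists are hypothetical, and the platform may aggregate categories or round scores differently.

```python
from sklearn.metrics import classification_report

# Hypothetical actual/predicted category labels for one dataset version.
actual    = ["s300"] * 18 + ["*background"]
predicted = ["s300"] * 17 + ["*background"] + ["s300"]

labels = ["s300", "background", "*background"]
print(classification_report(actual, predicted, labels=labels, digits=3, zero_division=0))
```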